Non-Sparse Regularization with Multiple Kernels

نویسندگان

Marius Kloft

Ulf Brefeld

Soeren Sonnenburg

Alexander Zien

Pavel Laskov

Motoaki Kawanabe

Vojtech Franc

Peter Gehler

Gunnar Raetsch

Peter Bartlett

چکیده

Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). It is to be expected in such cases that learning algorithms will have to deal with manipulated data aimed at hampering decision making. Although some previous work addressed the handling of malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution, we analyze the performance of a particular method – online centroid anomaly detection – in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, analysis of its efficiency and constraints. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly under different conditions: bounded and unbounded percentage of traffic, and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker’s gain) if external constraints are properly used. Our experimental evaluation carried out on real HTTP and exploit traces confirms the tightness of our theoretical bounds and practicality of our protection mechanisms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Non-Sparse Regularization and Efficient Training with Multiple Kernels

Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability and scalability. Unfortunately, this `1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtur...

متن کامل

Variable Sparsity Kernel Learning Variable Sparsity Kernel Learning

This paper presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1 norm regularization for promoting sparsity within RKHS norms of each group and lq, q ≥ 2 norm regularization for promoting non-sparse combinations across groups. Vario...

متن کامل

Variable Sparsity Kernel Learning

This paper1 presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1 norm regularization for promoting sparsity within RKHS norms of each group and ls,s ≥ 2 norm regularization for promoting non-sparse combinations across groups. Vario...

متن کامل

CAS WAVELET METHOD FOR THE NUMERICAL SOLUTION OF BOUNDARY INTEGRAL EQUATIONS WITH LOGARITHMIC SINGULAR KERNELS

In this paper, we present a computational method for solving boundary integral equations with loga-rithmic singular kernels which occur as reformulations of a boundary value problem for the Laplacian equation. Themethod is based on the use of the Galerkin method with CAS wavelets constructed on the unit interval as basis.This approach utilizes the non-uniform Gauss-Legendre quadrature rule for ...

متن کامل

Sparsity in Multiple Kernel Learning

The problem of multiple kernel learning based on penalized empirical risk minimization is discussed. The complexity penalty is determined jointly by the empirical L2 norms and the reproducing kernel Hilbert space (RKHS) norms induced by the kernels with a data-driven choice of regularization parameters. The main focus is on the case when the total number of kernels is large, but only a relative...

متن کامل